ChemEx: information extraction system for chemical data curation
Internal identifier: 000268 (Main/Exploration); previous: 000267; next: 000269
Authors: Atima Tharatipyakul [Thailand]; Somrak Numnark [Thailand]; Duangdao Wichadakul [Thailand]; Supawadee Ingsriswang [Thailand]
Source:
- BMC Bioinformatics [1471-2105]; 2012.
Abstract
Manual curation of chemical data from publications is error-prone and time-consuming, and it makes data sets hard to keep up to date. Automatic information extraction can reduce these problems. Because chemical structures are usually described in images, information extraction needs to combine structure image recognition with text mining.
We have developed ChemEx, a chemical information extraction system. ChemEx processes both text and images in publications. Its text annotator extracts compound, organism, and assay entities from text content, while structure image recognition translates chemical raster images into a machine-readable format. A user can view the annotated text along with summarized information on compounds, the organisms that produce those compounds, and assay tests.
ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from http://www.biotec.or.th/isl/ChemEx.
URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521388
DOI: 10.1186/1471-2105-13-S17-S9
PubMed: 23282330
PubMed Central: 3521388
Affiliations:
Links to previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000098
- to stream Pmc, to step Curation: 000098
- to stream Pmc, to step Checkpoint: 000111
- to stream Ncbi, to step Merge: 000150
- to stream Ncbi, to step Curation: 000150
- to stream Ncbi, to step Checkpoint: 000150
- to stream Main, to step Merge: 000271
- to stream Main, to step Curation: 000268
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">ChemEx: information extraction system for chemical data curation</title>
<author><name sortKey="Tharatipyakul, Atima" sort="Tharatipyakul, Atima" uniqKey="Tharatipyakul A" first="Atima" last="Tharatipyakul">Atima Tharatipyakul</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Numnark, Somrak" sort="Numnark, Somrak" uniqKey="Numnark S" first="Somrak" last="Numnark">Somrak Numnark</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wichadakul, Duangdao" sort="Wichadakul, Duangdao" uniqKey="Wichadakul D" first="Duangdao" last="Wichadakul">Duangdao Wichadakul</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Ingsriswang, Supawadee" sort="Ingsriswang, Supawadee" uniqKey="Ingsriswang S" first="Supawadee" last="Ingsriswang">Supawadee Ingsriswang</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">23282330</idno>
<idno type="pmc">3521388</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521388</idno>
<idno type="RBID">PMC:3521388</idno>
<idno type="doi">10.1186/1471-2105-13-S17-S9</idno>
<date when="2012">2012</date>
<idno type="wicri:Area/Pmc/Corpus">000098</idno>
<idno type="wicri:Area/Pmc/Curation">000098</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000111</idno>
<idno type="wicri:Area/Ncbi/Merge">000150</idno>
<idno type="wicri:Area/Ncbi/Curation">000150</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000150</idno>
<idno type="wicri:Area/Main/Merge">000271</idno>
<idno type="wicri:Area/Main/Curation">000268</idno>
<idno type="wicri:Area/Main/Exploration">000268</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">ChemEx: information extraction system for chemical data curation</title>
<author><name sortKey="Tharatipyakul, Atima" sort="Tharatipyakul, Atima" uniqKey="Tharatipyakul A" first="Atima" last="Tharatipyakul">Atima Tharatipyakul</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Numnark, Somrak" sort="Numnark, Somrak" uniqKey="Numnark S" first="Somrak" last="Numnark">Somrak Numnark</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wichadakul, Duangdao" sort="Wichadakul, Duangdao" uniqKey="Wichadakul D" first="Duangdao" last="Wichadakul">Duangdao Wichadakul</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Ingsriswang, Supawadee" sort="Ingsriswang, Supawadee" uniqKey="Ingsriswang S" first="Supawadee" last="Ingsriswang">Supawadee Ingsriswang</name>
<affiliation wicri:level="1"><nlm:aff id="I1">Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani, Thailand</nlm:aff>
<country xml:lang="fr">Thaïlande</country>
<wicri:regionArea>Information Systems Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Phaholyothin Road, Klong 1, Klong Luang, Pathumthani</wicri:regionArea>
<wicri:noRegion>Pathumthani</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint><date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Manual curation of chemical data from publications is error-prone and time-consuming, and it makes data sets hard to keep up to date. Automatic information extraction can reduce these problems. Because chemical structures are usually described in images, information extraction needs to combine structure image recognition with text mining.</p>
</sec>
<sec><title>Results</title>
<p>We have developed ChemEx, a chemical information extraction system. ChemEx processes both text and images in publications. Its text annotator extracts compound, organism, and assay entities from text content, while structure image recognition translates chemical raster images into a machine-readable format. A user can view the annotated text along with summarized information on compounds, the organisms that produce those compounds, and assay tests.</p>
</sec>
<sec><title>Conclusions</title>
<p>ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from <ext-link ext-link-type="uri" xlink:href="http://www.biotec.or.th/isl/ChemEx">http://www.biotec.or.th/isl/ChemEx</ext-link>.</p>
</sec>
</div>
</front>
</TEI>
<affiliations><list><country><li>Thaïlande</li>
</country>
</list>
<tree><country name="Thaïlande"><noRegion><name sortKey="Tharatipyakul, Atima" sort="Tharatipyakul, Atima" uniqKey="Tharatipyakul A" first="Atima" last="Tharatipyakul">Atima Tharatipyakul</name>
</noRegion>
<name sortKey="Ingsriswang, Supawadee" sort="Ingsriswang, Supawadee" uniqKey="Ingsriswang S" first="Supawadee" last="Ingsriswang">Supawadee Ingsriswang</name>
<name sortKey="Numnark, Somrak" sort="Numnark, Somrak" uniqKey="Numnark S" first="Somrak" last="Numnark">Somrak Numnark</name>
<name sortKey="Wichadakul, Duangdao" sort="Wichadakul, Duangdao" uniqKey="Wichadakul D" first="Duangdao" last="Wichadakul">Duangdao Wichadakul</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000268 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000268 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:3521388 |texte= ChemEx: information extraction system for chemical data curation }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:23282330" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
       | NlmPubMed2Wicri -a OcrV1
This area was generated with Dilib version V0.6.32.